
            <!DOCTYPE html>
            <html lang="en">
            <head>
                <meta charset="UTF-8">
                <title>[Prometheus in Practice] Monitoring Kubernetes Cluster Nodes</title>
            </head>
            <body>
            <a href="https://andyoung.blog.csdn.net">Original author's blog</a>
            <div id="content_views" class="markdown_views prism-atom-one-light">
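                    <h2><a id="_Kubernetes__0"></a>Monitoring Kubernetes Cluster Nodes</h2> 
<p>When monitoring a cluster, we generally need to consider the following aspects:</p> 
<ul><li>Kubernetes node monitoring: metrics such as CPU, load, disk, and memory for each node</li><li>Status of internal system components: the detailed running state of components such as kube-scheduler, kube-controller-manager, and kubedns/coredns</li><li>Orchestration-level metrics: data such as Deployment status, resource requests, scheduling, and API latency</li></ul> 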
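<h3><a id="_8"></a>Monitoring Options</h3> 
<p>The main options for monitoring a Kubernetes cluster are currently:</p> 
<ul><li> <p>Heapster: Heapster is a cluster-wide monitoring and data aggregation tool that runs in the cluster as a Pod. <img src="https://i-blog.csdnimg.cn/blog_migrate/800160420d5e208c2f4e3f62f3b0b35e.png" alt="heapster"> Besides Kubelet/cAdvisor, we can also feed other metric sources into Heapster, such as kube-state-metrics, which is covered below.</p> 
  <blockquote> 
   <p>Note that Heapster has been deprecated; later versions use metrics-server instead.</p> 
  </blockquote> </li><li> <p>cAdvisor: <a href="https://github.com/google/cadvisor">cAdvisor</a> is an open-source container resource monitoring and performance analysis tool from <code>Google</code>. It was built specifically for containers and supports Docker containers natively. In Kubernetes we do not need to install it separately, because cAdvisor is built into the kubelet and can be used directly.</p> </li><li> <p>Kube-state-metrics: <a href="https://github.com/kubernetes/kube-state-metrics">kube-state-metrics</a> listens to the API Server and generates state metrics about resource objects such as Deployments, Nodes, and Pods. Note that kube-state-metrics only exposes the metrics and does not store them, so we can use Prometheus to scrape and store this data.</p> </li><li> <p>metrics-server: metrics-server is also a cluster-wide resource data aggregation tool and is the replacement for Heapster. Likewise, metrics-server only exposes data and does not provide any storage.</p> </li></ul> 
<p>kube-state-metrics and metrics-server are nevertheless quite different. The main differences are:</p> 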
<ul><li>kube-state-metrics focuses on metadata about workloads, such as the state of Deployments, Pods, and replicas</li><li>metrics-server focuses on implementing the <a href="https://github.com/kubernetes/community/blob/master/contributors/design-proposals/instrumentation/resource-metrics-api.md">resource metrics API</a>, that is, resource usage such as the CPU and memory consumption of nodes and Pods.</li></ul> 
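<h3><a id="_27"></a>Monitoring Cluster Nodes</h3> 
<p>Let's start monitoring the cluster, beginning with its nodes. There are already many mature solutions for node monitoring, such as Nagios and Zabbix, and we could even collect the data ourselves. Here we use Prometheus to collect node metrics, obtained through <a href="https://github.com/prometheus/node_exporter">node_exporter</a>. As the name suggests, node_exporter collects the various runtime metrics of a server node. It currently supports almost all common monitoring points, such as conntrack, cpu, diskstats, filesystem, loadavg, meminfo, and netstat; see its <a href="https://github.com/prometheus/node_exporter">GitHub repo</a> for the full list.</p> 
<p>We can deploy the service with a DaemonSet controller, so that every node automatically runs one such Pod. When nodes are removed from or added to the cluster, the set of Pods adjusts automatically.</p> 
<p>A few details need attention when deploying node-exporter, as shown in the following manifest (prome-node-exporter.yaml):</p> 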
<pre><code class="prism language-yaml"><span class="token key atrule">apiVersion</span><span class="token punctuation">:</span> apps/v1
<span class="token key atrule">kind</span><span class="token punctuation">:</span> DaemonSet
<span class="token key atrule">metadata</span><span class="token punctuation">:</span>
  <span class="token key atrule">name</span><span class="token punctuation">:</span> node<span class="token punctuation">-</span>exporter
  <span class="token key atrule">namespace</span><span class="token punctuation">:</span> kube<span class="token punctuation">-</span>ops
  <span class="token key atrule">labels</span><span class="token punctuation">:</span>
    <span class="token key atrule">name</span><span class="token punctuation">:</span> node<span class="token punctuation">-</span>exporter
<span class="token key atrule">spec</span><span class="token punctuation">:</span>
  <span class="token key atrule">selector</span><span class="token punctuation">:</span>
    <span class="token key atrule">matchLabels</span><span class="token punctuation">:</span>
      <span class="token key atrule">name</span><span class="token punctuation">:</span> node<span class="token punctuation">-</span>exporter
  <span class="token key atrule">template</span><span class="token punctuation">:</span>
    <span class="token key atrule">metadata</span><span class="token punctuation">:</span>
      <span class="token key atrule">labels</span><span class="token punctuation">:</span>
        <span class="token key atrule">name</span><span class="token punctuation">:</span> node<span class="token punctuation">-</span>exporter
    <span class="token key atrule">spec</span><span class="token punctuation">:</span>
      <span class="token key atrule">hostPID</span><span class="token punctuation">:</span> <span class="token boolean important">true</span>
      <span class="token key atrule">hostIPC</span><span class="token punctuation">:</span> <span class="token boolean important">true</span>
      <span class="token key atrule">hostNetwork</span><span class="token punctuation">:</span> <span class="token boolean important">true</span>
      <span class="token key atrule">containers</span><span class="token punctuation">:</span>
      <span class="token punctuation">-</span> <span class="token key atrule">name</span><span class="token punctuation">:</span> node<span class="token punctuation">-</span>exporter
        <span class="token key atrule">image</span><span class="token punctuation">:</span> prom/node<span class="token punctuation">-</span>exporter<span class="token punctuation">:</span>v0.16.0
        <span class="token key atrule">ports</span><span class="token punctuation">:</span>
        <span class="token punctuation">-</span> <span class="token key atrule">containerPort</span><span class="token punctuation">:</span> <span class="token number">9100</span>
        <span class="token key atrule">resources</span><span class="token punctuation">:</span>
          <span class="token key atrule">requests</span><span class="token punctuation">:</span>
            <span class="token key atrule">cpu</span><span class="token punctuation">:</span> <span class="token number">0.15</span>
        <span class="token key atrule">securityContext</span><span class="token punctuation">:</span>
          <span class="token key atrule">privileged</span><span class="token punctuation">:</span> <span class="token boolean important">true</span>
        <span class="token key atrule">args</span><span class="token punctuation">:</span>
        <span class="token punctuation">-</span> <span class="token punctuation">-</span><span class="token punctuation">-</span>path.procfs
        <span class="token punctuation">-</span> /host/proc
        <span class="token punctuation">-</span> <span class="token punctuation">-</span><span class="token punctuation">-</span>path.sysfs
        <span class="token punctuation">-</span> /host/sys
        <span class="token punctuation">-</span> <span class="token punctuation">-</span><span class="token punctuation">-</span>collector.filesystem.ignored<span class="token punctuation">-</span>mount<span class="token punctuation">-</span>points
        <span class="token punctuation">-</span> <span class="token string">'^/(sys|proc|dev|host|etc)($|/)'</span>
        <span class="token key atrule">volumeMounts</span><span class="token punctuation">:</span>
        <span class="token punctuation">-</span> <span class="token key atrule">name</span><span class="token punctuation">:</span> dev
          <span class="token key atrule">mountPath</span><span class="token punctuation">:</span> /host/dev
        <span class="token punctuation">-</span> <span class="token key atrule">name</span><span class="token punctuation">:</span> proc
          <span class="token key atrule">mountPath</span><span class="token punctuation">:</span> /host/proc
        <span class="token punctuation">-</span> <span class="token key atrule">name</span><span class="token punctuation">:</span> sys
          <span class="token key atrule">mountPath</span><span class="token punctuation">:</span> /host/sys
        <span class="token punctuation">-</span> <span class="token key atrule">name</span><span class="token punctuation">:</span> rootfs
          <span class="token key atrule">mountPath</span><span class="token punctuation">:</span> /rootfs
      <span class="token key atrule">tolerations</span><span class="token punctuation">:</span>
      <span class="token punctuation">-</span> <span class="token key atrule">key</span><span class="token punctuation">:</span> <span class="token string">"node-role.kubernetes.io/master"</span>
        <span class="token key atrule">operator</span><span class="token punctuation">:</span> <span class="token string">"Exists"</span>
        <span class="token key atrule">effect</span><span class="token punctuation">:</span> <span class="token string">"NoSchedule"</span>
      <span class="token key atrule">volumes</span><span class="token punctuation">:</span>
        <span class="token punctuation">-</span> <span class="token key atrule">name</span><span class="token punctuation">:</span> proc
          <span class="token key atrule">hostPath</span><span class="token punctuation">:</span>
            <span class="token key atrule">path</span><span class="token punctuation">:</span> /proc
        <span class="token punctuation">-</span> <span class="token key atrule">name</span><span class="token punctuation">:</span> dev
          <span class="token key atrule">hostPath</span><span class="token punctuation">:</span>
            <span class="token key atrule">path</span><span class="token punctuation">:</span> /dev
        <span class="token punctuation">-</span> <span class="token key atrule">name</span><span class="token punctuation">:</span> sys
          <span class="token key atrule">hostPath</span><span class="token punctuation">:</span>
            <span class="token key atrule">path</span><span class="token punctuation">:</span> /sys
        <span class="token punctuation">-</span> <span class="token key atrule">name</span><span class="token punctuation">:</span> rootfs
          <span class="token key atrule">hostPath</span><span class="token punctuation">:</span>
            <span class="token key atrule">path</span><span class="token punctuation">:</span> /
</code></pre> 
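<p>Because the data we want are host-level metrics while node-exporter runs inside a container, the Pod needs some host-access settings. Here we add three of them, <code>hostPID: true</code>, <code>hostIPC: true</code>, and <code>hostNetwork: true</code>, so that the Pod uses the host's PID namespace, IPC namespace, and network. These namespaces are the key technology behind container isolation. Note that they are a completely different concept from the namespaces in the cluster.</p> 
<p>We also mount the host's <code>/dev</code>, <code>/proc</code>, and <code>/sys</code> directories into the container, because much of the node data we collect is read from files under these directories. For example, the <code>top</code> command shows current <code>cpu</code> usage using data from <code>/proc/stat</code>, and the <code>free</code> command shows current memory usage using data from <code>/proc/meminfo</code>.</p> 
<p>In addition, our cluster was built with kubeadm, so if we want the master node to be monitored as well, we need to add the corresponding toleration. If you are not yet familiar with taints and tolerations, revisit the earlier chapters.</p> 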
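<p>As a hedged sketch of the toleration discussed above: newer kubeadm releases taint control-plane nodes with <code>node-role.kubernetes.io/control-plane</code> rather than <code>node-role.kubernetes.io/master</code>, so a manifest intended for both older and newer clusters can tolerate both keys. The fragment below would take the place of the <code>tolerations</code> section in the DaemonSet's Pod spec. Which taint your nodes actually carry is an assumption worth checking with <code>kubectl describe node</code>.</p> 
<pre><code class="prism language-yaml"># Fragment of the DaemonSet Pod spec (spec.template.spec); not a complete manifest.
tolerations:
# Taint applied by older kubeadm releases, as used in the manifest above.
- key: "node-role.kubernetes.io/master"
  operator: "Exists"
  effect: "NoSchedule"
# Taint applied by newer kubeadm releases (assumption: verify on your own nodes).
- key: "node-role.kubernetes.io/control-plane"
  operator: "Exists"
  effect: "NoSchedule"
</code></pre> 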
<p>Then create the DaemonSet from the prome-node-exporter.yaml manifest:</p> 
<pre><code class="prism language-shell">$ kubectl create <span class="token parameter variable">-f</span> prome-node-exporter.yaml
daemonset.extensions <span class="token string">"node-exporter"</span> created
$ kubectl get pods <span class="token parameter variable">-n</span> kube-ops <span class="token parameter variable">-o</span> wide
NAME                          READY     STATUS    RESTARTS   AGE       IP             NODE
node-exporter-jfwfv           <span class="token number">1</span>/1       Running   <span class="token number">0</span>          30m       <span class="token number">10.151</span>.30.63   node02
node-exporter-kr8rt           <span class="token number">1</span>/1       Running   <span class="token number">0</span>          30m       <span class="token number">10.151</span>.30.64   node03
node-exporter-whb7n           <span class="token number">1</span>/1       Running   <span class="token number">0</span>          20m       <span class="token number">10.151</span>.30.57   master
prometheus-8566cd9699-gt9wh   <span class="token number">1</span>/1       Running   <span class="token number">0</span>          4d        <span class="token number">10.244</span>.4.39    node02
redis-544b6c8c54-8xd2g        <span class="token number">2</span>/2       Running   <span class="token number">0</span>          23h       <span class="token number">10.244</span>.2.87    node03
</code></pre> 
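<p>After deployment, a Pod is running on each of the 3 nodes. You may ask: don't we need to create a Service here? How do we get the <code>/metrics</code> data? Because we set <code>hostNetwork: true</code> above, port 9100 is bound on every node, and we can fetch the metrics through that port:</p> 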
<pre><code class="prism language-shell">$ <span class="token function">curl</span> <span class="token number">127.0</span>.0.1:9100/metrics
<span class="token punctuation">..</span>.
node_filesystem_device_error<span class="token punctuation">{<!-- --></span>device<span class="token operator">=</span><span class="token string">"shm"</span>,fstype<span class="token operator">=</span><span class="token string">"tmpfs"</span>,mountpoint<span class="token operator">=</span><span class="token string">"/rootfs/var/lib/docker/containers/aefe8b1b63c3aa5f27766053ec817415faf8f6f417bb210d266fef0c2da64674/shm"</span><span class="token punctuation">}</span> <span class="token number">1</span>
node_filesystem_device_error<span class="token punctuation">{<!-- --></span>device<span class="token operator">=</span><span class="token string">"shm"</span>,fstype<span class="token operator">=</span><span class="token string">"tmpfs"</span>,mountpoint<span class="token operator">=</span><span class="token string">"/rootfs/var/lib/docker/containers/c8652ca72230496038a07e4fe4ee47046abb5f88d9d2440f0c8a923d5f3e133c/shm"</span><span class="token punctuation">}</span> <span class="token number">1</span>
node_filesystem_device_error<span class="token punctuation">{<!-- --></span>device<span class="token operator">=</span><span class="token string">"tmpfs"</span>,fstype<span class="token operator">=</span><span class="token string">"tmpfs"</span>,mountpoint<span class="token operator">=</span><span class="token string">"/dev"</span><span class="token punctuation">}</span> <span class="token number">0</span>
node_filesystem_device_error<span class="token punctuation">{<!-- --></span>device<span class="token operator">=</span><span class="token string">"tmpfs"</span>,fstype<span class="token operator">=</span><span class="token string">"tmpfs"</span>,mountpoint<span class="token operator">=</span><span class="token string">"/dev/shm"</span><span class="token punctuation">}</span> <span class="token number">0</span>
<span class="token punctuation">..</span>.
</code></pre> 
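<p>If the manual installation above feels cumbersome, we can also install it with Helm. The command below uses Helm 2 syntax; Helm 3 dropped the <code>--name</code> flag and takes the release name as a positional argument:</p> 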
<pre><code class="prism language-shell">$ helm <span class="token function">install</span> <span class="token parameter variable">--name</span> node-exporter stable/prometheus-node-exporter <span class="token parameter variable">--namespace</span> kube-ops
</code></pre> 
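<h3><a id="_138"></a>Service Discovery</h3> 
<p>node-exporter is now running on all 3 nodes. If we gathered the data behind a single Service and added it to Prometheus as a static configuration, only one target would show up, and we would have to filter out each node's data from the metrics ourselves. Is there a way for Prometheus to discover the node-exporter instances automatically and group them by node? There is: the <a href="https://andyoung.blog.csdn.net/article/details/126263009" rel="nofollow"><strong>service discovery</strong></a> mentioned earlier.</p> 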
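<p>For contrast, a minimal sketch of what listing every node by hand in a static configuration might look like. The job name is hypothetical and the addresses are the node IPs from the <code>kubectl get pods -o wide</code> output earlier; the list would have to be edited manually whenever a node joins or leaves the cluster, which is exactly what service discovery avoids.</p> 
<pre><code class="prism language-yaml"># Hypothetical static alternative, shown only for comparison.
- job_name: 'node-exporter-static'
  static_configs:
  - targets:
    - '10.151.30.57:9100'   # master
    - '10.151.30.63:9100'   # node02
    - '10.151.30.64:9100'   # node03
</code></pre> 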
<p>On Kubernetes, Prometheus integrates with the Kubernetes API and currently supports five main service discovery modes: Node, Service, Pod, Endpoints, and Ingress.</p> 
<p>With kubectl we can easily list all the nodes in the current cluster:</p> 
<pre><code class="prism language-shell">$ kubectl get nodes
NAME      STATUS    ROLES     AGE       VERSION
master    Ready     master    165d      v1.10.0
node02    Ready     <span class="token operator">&lt;</span>none<span class="token operator">&gt;</span>    85d       v1.10.0
node03    Ready     <span class="token operator">&lt;</span>none<span class="token operator">&gt;</span>    145d      v1.10.0
</code></pre> 
<p>For Prometheus to obtain the same list of nodes, we need the Node service discovery mode. As before, configure the following job in prometheus.yml:</p> 
<pre><code class="prism language-yaml"><span class="token punctuation">-</span> <span class="token key atrule">job_name</span><span class="token punctuation">:</span> <span class="token string">'kubernetes-nodes'</span>
  <span class="token key atrule">kubernetes_sd_configs</span><span class="token punctuation">:</span>
  <span class="token punctuation">-</span> <span class="token key atrule">role</span><span class="token punctuation">:</span> node
</code></pre> 
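<p>With the role of <code>kubernetes_sd_configs</code> set to <code>node</code>, Prometheus automatically discovers all nodes in Kubernetes and uses them as the target instances of this job. By default, the <code>/metrics</code> endpoint of each discovered node is the kubelet's HTTP port.</p> 
<p>After the Prometheus ConfigMap has been updated, we run the reload as before so that the configuration takes effect:</p> 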
<pre><code class="prism language-shell">$ kubectl delete <span class="token parameter variable">-f</span> prome-cm.yaml
configmap <span class="token string">"prometheus-config"</span> deleted
$ kubectl create <span class="token parameter variable">-f</span> prome-cm.yaml
configmap <span class="token string">"prometheus-config"</span> created
<span class="token comment"># Wait a moment before running the reload below</span>
$ kubectl get svc <span class="token parameter variable">-n</span> kube-ops
NAME            TYPE        CLUSTER-IP      EXTERNAL-IP   PORT<span class="token punctuation">(</span>S<span class="token punctuation">)</span>                          AGE
prometheus      NodePort    <span class="token number">10.102</span>.74.90    <span class="token operator">&lt;</span>none<span class="token operator">&gt;</span>        <span class="token number">9090</span>:30358/TCP                   5d
<span class="token punctuation">..</span><span class="token punctuation">..</span><span class="token punctuation">..</span>
$ <span class="token function">curl</span> <span class="token parameter variable">-X</span> POST <span class="token string">"http://10.102.74.90:9090/-/reload"</span>
</code></pre> 
<p>Once the configuration has taken effect, open the Prometheus dashboard at <strong>any node IP:30358</strong> and check on the Targets page whether data is being scraped correctly:</p> 
<p><img src="https://i-blog.csdnimg.cn/blog_migrate/f331a1ba569b55c4779acf029516e314.png" alt="prometheus nodes target">prometheus nodes target</p> 
<p>The <code>kubernetes-nodes</code> job has automatically discovered our 3 nodes, but scraping failed with an error similar to the following:</p> 
<pre><code class="prism language-shell">Get http://10.151.30.57:10250/metrics: net/http: HTTP/1.x transport connection broken: malformed HTTP response <span class="token string">"<span class="token entity" title="\x15">\x15</span><span class="token entity" title="\x03">\x03</span><span class="token entity" title="\x01">\x01</span><span class="token entity" title="\x00">\x00</span><span class="token entity" title="\x02">\x02</span><span class="token entity" title="\x02">\x02</span>"</span>
</code></pre> 
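<p>This happens because in Node discovery mode Prometheus targets port <strong>10250</strong> by default. The kubelet serves that port over HTTPS, so a plain HTTP request to <code>/metrics</code> fails (the garbled bytes in the error are a TLS response), and in this cluster version the kubelet's read-only data is exposed over HTTP on port <strong>10255</strong> instead. So the port has to be replaced, but should it become <strong>10255</strong>? No: what we want here are the node metrics collected by <code>node-exporter</code>, and because we set <code>hostNetwork: true</code> above, port <strong>9100</strong> is bound on every node. We should therefore replace <strong>10250</strong> with <strong>9100</strong>. But how?</p> 
<p>This is where the <code>replace</code> action of Prometheus's <code>relabel_configs</code> comes in. Before scraping, relabeling can dynamically rewrite label values based on the metadata of a target instance. It can also use that metadata to decide whether a target is scraped or ignored. Here, for example, we can match the <code>__address__</code> label and replace the port in it:</p> 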
<pre><code class="prism language-yaml"><span class="token punctuation">-</span> <span class="token key atrule">job_name</span><span class="token punctuation">:</span> <span class="token string">'kubernetes-nodes'</span>
  <span class="token key atrule">kubernetes_sd_configs</span><span class="token punctuation">:</span>
  <span class="token punctuation">-</span> <span class="token key atrule">role</span><span class="token punctuation">:</span> node
  <span class="token key atrule">relabel_configs</span><span class="token punctuation">:</span>
  <span class="token punctuation">-</span> <span class="token key atrule">source_labels</span><span class="token punctuation">:</span> <span class="token punctuation">[</span>__address__<span class="token punctuation">]</span>
    <span class="token key atrule">regex</span><span class="token punctuation">:</span> <span class="token string">'(.*):10250'</span>
    <span class="token key atrule">replacement</span><span class="token punctuation">:</span> <span class="token string">'${1}:9100'</span>
    <span class="token key atrule">target_label</span><span class="token punctuation">:</span> __address__
    <span class="token key atrule">action</span><span class="token punctuation">:</span> replace
</code></pre> 
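<p>The rule is a regular expression matched against <code>__address__</code>: the host part is captured and kept, and the port is replaced with <strong>9100</strong>. Now update the configuration again, run the reload, and check on the Targets page of the Prometheus dashboard whether the kubernetes-nodes job is healthy:</p> 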
<p><img src="https://i-blog.csdnimg.cn/blog_migrate/2ce1ee12b8e552814c46eb8dc324dca9.png" alt="prometheus nodes target2">prometheus nodes target2</p> 
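<p>The targets are healthy now, but one problem remains: the only identifying label on the collected metrics is the node's hostname, which is inconvenient when grouping and categorising queries. It would be much better if we could also pick up <strong>the labels of the Node objects in the cluster</strong>.</p> 
<p>We can use the <code>labelmap</code> action to add the Kubernetes labels as Prometheus metric labels:</p> 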
<pre><code class="prism language-yaml"><span class="token punctuation">-</span> <span class="token key atrule">job_name</span><span class="token punctuation">:</span> <span class="token string">'kubernetes-nodes'</span>
  <span class="token key atrule">kubernetes_sd_configs</span><span class="token punctuation">:</span>
  <span class="token punctuation">-</span> <span class="token key atrule">role</span><span class="token punctuation">:</span> node
  <span class="token key atrule">relabel_configs</span><span class="token punctuation">:</span>
  <span class="token punctuation">-</span> <span class="token key atrule">source_labels</span><span class="token punctuation">:</span> <span class="token punctuation">[</span>__address__<span class="token punctuation">]</span>
    <span class="token key atrule">regex</span><span class="token punctuation">:</span> <span class="token string">'(.*):10250'</span>
    <span class="token key atrule">replacement</span><span class="token punctuation">:</span> <span class="token string">'${1}:9100'</span>
    <span class="token key atrule">target_label</span><span class="token punctuation">:</span> __address__
    <span class="token key atrule">action</span><span class="token punctuation">:</span> replace
  <span class="token punctuation">-</span> <span class="token key atrule">action</span><span class="token punctuation">:</span> labelmap
    <span class="token key atrule">regex</span><span class="token punctuation">:</span> __meta_kubernetes_node_label_(.+)
</code></pre> 
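<p>This adds a rule whose action is <code>labelmap</code> and whose regular expression is <code>__meta_kubernetes_node_label_(.+)</code>. It means that every meta label matching the expression is also added to the labels of the metrics, with the captured part used as the label name.</p> 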
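<p>To make the effect of <code>labelmap</code> more concrete, here is a small illustration written as a commented fragment. It assumes a node carrying the label <code>kubernetes.io/hostname=node02</code>, chosen only as an example; the <code>replacement</code> line just spells out the default value.</p> 
<pre><code class="prism language-yaml"># Illustration only: how the labelmap rule rewrites discovered metadata.
# Assumed node label:  kubernetes.io/hostname=node02
#
# Meta label produced by discovery (unsupported characters become underscores):
#   __meta_kubernetes_node_label_kubernetes_io_hostname="node02"
# Label attached to the target after relabeling:
#   kubernetes_io_hostname="node02"
relabel_configs:
- action: labelmap
  regex: __meta_kubernetes_node_label_(.+)
  replacement: $1   # the default; the captured group becomes the new label name
</code></pre> 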
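<p>The following meta labels are available for the <code>node</code> role of kubernetes_sd_configs:</p> 
<ul><li><code>__meta_kubernetes_node_name</code>: the name of the node object</li><li><code>__meta_kubernetes_node_label_&lt;labelname&gt;</code>: each label of the node object</li><li><code>__meta_kubernetes_node_annotation_&lt;annotationname&gt;</code>: each annotation of the node object</li><li><code>__meta_kubernetes_node_address_&lt;address_type&gt;</code>: the first address for each node address type, if it exists</li></ul> 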
<blockquote> 
 <p>For more information about kubernetes_sd_configs, see the official documentation: <a href="https://prometheus.io/docs/prometheus/latest/configuration/configuration/#" rel="nofollow">kubernetes_sd_config</a></p> 
</blockquote> 
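<p>Relabeling can also filter targets based on this metadata, not just rewrite labels. As a rough sketch, the job below scrapes only nodes that carry a <code>monitoring=enabled</code> label. That label is hypothetical and would have to be added to the nodes yourself; every discovered node without it is dropped before scraping.</p> 
<pre><code class="prism language-yaml">- job_name: 'kubernetes-nodes'
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  # Keep only nodes labeled monitoring=enabled (hypothetical label);
  # targets whose meta label does not match the regex are discarded.
  - source_labels: [__meta_kubernetes_node_label_monitoring]
    regex: enabled
    action: keep
  # Point the remaining targets at node-exporter instead of the kubelet.
  - source_labels: [__address__]
    regex: '(.*):10250'
    replacement: '${1}:9100'
    target_label: __address__
    action: replace
</code></pre> 
<p>Using <code>action: drop</code> with the same matcher would invert the selection and exclude the labeled nodes instead.</p> 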
<p>In addition, the kubelet exposes some monitoring metrics of its own on the <strong>10255</strong> port mentioned above, so we configure a scrape job for the kubelet as well:</p> 
<pre><code class="prism language-yaml"><span class="token punctuation">-</span> <span class="token key atrule">job_name</span><span class="token punctuation">:</span> <span class="token string">'kubernetes-nodes'</span>
  <span class="token key atrule">kubernetes_sd_configs</span><span class="token punctuation">:</span>
  <span class="token punctuation">-</span> <span class="token key atrule">role</span><span class="token punctuation">:</span> node
  <span class="token key atrule">relabel_configs</span><span class="token punctuation">:</span>
  <span class="token punctuation">-</span> <span class="token key atrule">source_labels</span><span class="token punctuation">:</span> <span class="token punctuation">[</span>__address__<span class="token punctuation">]</span>
    <span class="token key atrule">regex</span><span class="token punctuation">:</span> <span class="token string">'(.*):10250'</span>
    <span class="token key atrule">replacement</span><span class="token punctuation">:</span> <span class="token string">'${1}:9100'</span>
    <span class="token key atrule">target_label</span><span class="token punctuation">:</span> __address__
    <span class="token key atrule">action</span><span class="token punctuation">:</span> replace
  <span class="token punctuation">-</span> <span class="token key atrule">action</span><span class="token punctuation">:</span> labelmap
    <span class="token key atrule">regex</span><span class="token punctuation">:</span> __meta_kubernetes_node_label_(.+)

<span class="token punctuation">-</span> <span class="token key atrule">job_name</span><span class="token punctuation">:</span> <span class="token string">'kubernetes-kubelet'</span>
  <span class="token key atrule">kubernetes_sd_configs</span><span class="token punctuation">:</span>
  <span class="token punctuation">-</span> <span class="token key atrule">role</span><span class="token punctuation">:</span> node
  <span class="token key atrule">relabel_configs</span><span class="token punctuation">:</span>
  <span class="token punctuation">-</span> <span class="token key atrule">source_labels</span><span class="token punctuation">:</span> <span class="token punctuation">[</span>__address__<span class="token punctuation">]</span>
    <span class="token key atrule">regex</span><span class="token punctuation">:</span> <span class="token string">'(.*):10250'</span>
    <span class="token key atrule">replacement</span><span class="token punctuation">:</span> <span class="token string">'${1}:10255'</span>
    <span class="token key atrule">target_label</span><span class="token punctuation">:</span> __address__
    <span class="token key atrule">action</span><span class="token punctuation">:</span> replace
  <span class="token punctuation">-</span> <span class="token key atrule">action</span><span class="token punctuation">:</span> labelmap
    <span class="token key atrule">regex</span><span class="token punctuation">:</span> __meta_kubernetes_node_label_(.+)
</code></pre> 
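<p><strong>Pay special attention</strong>: from Kubernetes 1.11 onwards, the kubelet's read-only port 10255 is no longer exposed by default and the metrics endpoint is back on port 10250, so there is no port to replace, but the scrape must use HTTPS. If you are running Kubernetes 1.11+, replace the <code>kubernetes-kubelet</code> job above with the following configuration:</p> 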
<pre><code class="prism language-yaml"><span class="token punctuation">-</span> <span class="token key atrule">job_name</span><span class="token punctuation">:</span> <span class="token string">'kubernetes-kubelet'</span>
  <span class="token key atrule">kubernetes_sd_configs</span><span class="token punctuation">:</span>
  <span class="token punctuation">-</span> <span class="token key atrule">role</span><span class="token punctuation">:</span> node
  <span class="token key atrule">scheme</span><span class="token punctuation">:</span> https
  <span class="token key atrule">tls_config</span><span class="token punctuation">:</span>
    <span class="token key atrule">ca_file</span><span class="token punctuation">:</span> /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    <span class="token key atrule">insecure_skip_verify</span><span class="token punctuation">:</span> <span class="token boolean important">true</span>
  <span class="token key atrule">bearer_token_file</span><span class="token punctuation">:</span> /var/run/secrets/kubernetes.io/serviceaccount/token
  <span class="token key atrule">relabel_configs</span><span class="token punctuation">:</span>
  <span class="token punctuation">-</span> <span class="token key atrule">action</span><span class="token punctuation">:</span> labelmap
    <span class="token key atrule">regex</span><span class="token punctuation">:</span> __meta_kubernetes_node_label_(.+)
</code></pre> 
<p>Now update the configuration again, run the reload so that it takes effect, and open the Targets page of the Prometheus dashboard:</p> 
<p><img src="https://i-blog.csdnimg.cn/blog_migrate/20b8b32fc45fe2fb35bb44475c09d195.png" alt="prometheus node targets"></p> 
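<p>Both jobs added above, <code>kubernetes-kubelet</code> and <code>kubernetes-nodes</code>, are now working, and the labels on both match the labels of the cluster's nodes.</p> 
<p>We can now switch to the Graph page to look at some of the collected metrics, for example by querying node_load1:</p> 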
<p><img src="https://i-blog.csdnimg.cn/blog_migrate/4075061e9a75fad9c71df8b033c4af07.png" alt="prometheus nodes graph1"></p> 
<p>The node_load1 series of all 3 nodes are returned. We can likewise use PromQL for more complex aggregation queries, and we can filter or aggregate the metrics by their labels. For example, to query only the data of node03, use the expression <code>node_load1{instance="node03"}</code>:</p> 
<p><img src="https://i-blog.csdnimg.cn/blog_migrate/b8da00ed97a92d5f1cde9b002dc4a4d5.png" alt="prometheus nodes graph2"></p> 
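<p>Queries that are used often can be precomputed with recording rules. The following is a minimal sketch of a rules file, assuming it is saved under a name such as node-rules.yml and referenced from <code>rule_files</code> in prometheus.yml. The group and rule names are hypothetical, and <code>node_cpu_seconds_total</code> is the CPU metric name used by node_exporter v0.16 and later.</p> 
<pre><code class="prism language-yaml">groups:
- name: node-examples   # hypothetical group name
  rules:
  # Average 1-minute load across all scraped nodes.
  - record: cluster:node_load1:avg
    expr: avg(node_load1)
  # Per-node CPU utilisation in percent, derived from idle time over the last 5 minutes.
  - record: instance:node_cpu_utilisation:percent
    expr: 100 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100
</code></pre> 
<p>The expressions can also be pasted directly into the Graph page to try them out before turning them into rules.</p> 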
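<p>At this point the Kubernetes cluster nodes are monitored with Prometheus. In the next lesson we will look at how to monitor resource objects such as Pods and Services.</p> 
<h2><a id="_298"></a>Summary</h2> 
<p>Monitoring Kubernetes cluster nodes by hand takes quite a bit of work. If you would rather avoid it, you can use Prometheus Operator to install a complete Prometheus monitoring setup for a Kubernetes cluster in one go. For details, see: <a href="https://andyoung.blog.csdn.net/article/details/127965449" rel="nofollow">Installing Prometheus monitoring on k8s in one go with a minimal Prometheus Operator configuration</a></p>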
                </div>
            </body>
            </html>
            